Regex for Beginners ✳️
Published at Dec 22, 2024
What is Regex? 🤔
Regex is a tool for matching patterns in strings. The syntax for regex involves two forward slashes (/) with the pattern in between, followed by optional flags that modify its behavior.
/pattern/flags
Flags 🚩
Flags change how regex behaves:
- Case Insensitive (i): Matches without considering case.
- Global (g): Finds all matches instead of stopping at the first.
/the/gi
Matches “the” in any case, globally.
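To see what the two flags change in practice, here is a minimal JavaScript sketch. The sample sentence and the variable names are made up for this illustration.

```javascript
// Hypothetical sample text, chosen only for illustration.
const text = "The cat and the dog saw THE bird.";

// No flags: the match is case-sensitive and stops at the first hit.
const first = text.match(/the/);
console.log(first[0], first.index); // the 12

// i + g: ignore case and collect every match.
const all = text.match(/the/gi);
console.log(all); // [ 'The', 'the', 'THE' ]
```

Without the g flag, match() returns details about the first match only; with it, you get a plain array of every matched string.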
Literal & Metacharacters 🔡🔣
Literal Characters 🔡
Regex can match literal characters. For instance, the regex:
cat
Matches occurrences of “cat” in a string.
Metacharacters 🔣
Metacharacters are special characters with specific meanings:
- * (star): Matches zero or more occurrences of the preceding element.
- . (dot): Matches any single character (by default, anything except a line break).
If you want to use the literal value of a metacharacter, escape it with a backslash (\).
Quantifiers 🧮
Quantifiers specify how many times a pattern should match:
- *: Zero or more times.
- +: One or more times.
- ?: Zero or one time (optional).
- {n}: Exactly n times.
- {n,}: At least n times.
- {n,m}: Between n and m times.
Because * allows zero repetitions, it can also match an empty string: matching zero times literally “matches” nothing.
Examples 📝
- a* matches "a", "aa", "aaaaaaaaaaaaa" (any number of times), or an empty string.
- a+ matches "a", "aa", "aaa", "aaaa", and so on (one or more times).
- a{2,4} matches "aa", "aaa", or "aaaa".
- hay? matches "ha" or "hay".
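The quantifier examples can be tried out with a short JavaScript sketch like the one below. The test strings are hypothetical, and the g flag is used only so that every match is listed.

```javascript
const sample = "a aa aaa aaaaa"; // hypothetical test string

// + needs at least one "a" and takes as many as it can.
const plus = sample.match(/a+/g);
console.log(plus); // [ 'a', 'aa', 'aaa', 'aaaaa' ]

// {2,4} needs two to four: the lone "a" is skipped, "aaaaa" yields only "aaaa".
const range = sample.match(/a{2,4}/g);
console.log(range); // [ 'aa', 'aaa', 'aaaa' ]

// ? makes the "y" optional.
const optional = "ha hay hays".match(/hay?/g);
console.log(optional); // [ 'ha', 'hay', 'hay' ]

// * may match zero times, so it produces empty matches where there is no "a".
const star = "baa".match(/a*/g);
console.log(star); // [ '', 'aa', '' ]
```

The empty strings in the last result are the "match nothing" case: one before the "b" and one at the very end of the string.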
Greedy 🤑 vs. Lazy Matching 😴
- Greedy Matching: Matches as much as possible. This is the default for quantifiers.
- Lazy Matching: Matches as little as possible. Add ? after a quantifier to make it lazy.
Examples 📝
When looking at the sentence:
The quick brown fox jumps over the lazy dog.
- Greedy: T.*o matches "The quick brown fox jumps over the lazy do".
- Lazy: T.*?o matches "The quick bro".
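The same comparison can be reproduced in JavaScript. This is a small sketch using the sentence from the example; the variable names are arbitrary.

```javascript
const sentence = "The quick brown fox jumps over the lazy dog.";

// Greedy: .* grabs as much as it can, then backs up to the last "o".
const greedy = sentence.match(/T.*o/)[0];
console.log(greedy); // The quick brown fox jumps over the lazy do

// Lazy: .*? grabs as little as it can, stopping at the first "o".
const lazy = sentence.match(/T.*?o/)[0];
console.log(lazy); // The quick bro
```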
Bracket Expressions
Bracket expressions match specific characters:
- [abc]: Matches “a”, “b”, or “c”.
- [a-z]: Matches any lowercase letter.
- [A-Z0-9]: Matches uppercase letters or digits.
- [^abc]: Matches anything except “a”, “b”, or “c”.
Example
- [a-zA-Z] matches any letter.
- [0-9] matches any digit.
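Here is a quick JavaScript sketch of the bracket expressions listed above, run against a made-up input string. Each pattern matches one character at a time, so the g flag is used to list every hit.

```javascript
const input = "Cab x42!"; // hypothetical input

const abc = input.match(/[abc]/g);
console.log(abc); // [ 'a', 'b' ]  (uppercase "C" is not in the set)

const lower = input.match(/[a-z]/g);
console.log(lower); // [ 'a', 'b', 'x' ]

const upperOrDigit = input.match(/[A-Z0-9]/g);
console.log(upperOrDigit); // [ 'C', '4', '2' ]

// ^ as the first character inside the brackets negates the set.
const notAbc = input.match(/[^abc]/g);
console.log(notAbc); // [ 'C', ' ', 'x', '4', '2', '!' ]
```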
Character Classes
Shorthand for common patterns:
- \d: Matches digits ([0-9]).
- \w: Matches word characters ([a-zA-Z0-9_]).
- \s: Matches whitespace.
- \D, \W, \S: Match the inverse (non-digits, non-word characters, non-whitespace).
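As a rough illustration, the shorthand classes can be combined with the + quantifier to pull pieces out of a string. The label text below is invented for this sketch.

```javascript
const label = "Room 42_b, floor 7"; // hypothetical input

const digits = label.match(/\d+/g);
console.log(digits); // [ '42', '7' ]

// \w includes letters, digits, and the underscore, but not the comma.
const words = label.match(/\w+/g);
console.log(words); // [ 'Room', '42_b', 'floor', '7' ]

const spaces = label.match(/\s/g);
console.log(spaces.length); // 3

// \D is the inverse of \d: runs of anything that is not a digit.
const nonDigits = label.match(/\D+/g);
console.log(nonDigits); // [ 'Room ', '_b, floor ' ]
```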
Anchors
Anchors match specific positions in a string:
- ^: Start of a string.
- $: End of a string.
- \b: Word boundary.
Example
- ^The matches “The” at the start of a string.
- end$ matches “end” at the end of a string.
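Anchors match positions rather than characters, which is easiest to see with test(). The strings in this minimal sketch are made up for illustration.

```javascript
// ^ ties the pattern to the start of the string.
const startsWithThe = /^The/.test("The end");
const theInMiddle = /^The/.test("In The end");
console.log(startsWithThe, theInMiddle); // true false

// $ ties the pattern to the end of the string.
const endsWithEnd = /end$/.test("The end");
const endThenDot = /end$/.test("The end.");
console.log(endsWithEnd, endThenDot); // true false

// \b requires a word boundary on each side, so "cat" inside a longer word fails.
const wholeWord = /\bcat\b/.test("a cat sat");
const insideWord = /\bcat\b/.test("concatenate");
console.log(wholeWord, insideWord); // true false
```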
Groups and Alternation
- Capturing Groups: Use parentheses to group patterns and capture the text they match.
  - Example: (fox|dog) matches “fox” or “dog”.
- Alternation: Use | for logical OR.
  - Example: cat|dog matches “cat” or “dog”.
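The sketch below shows one way groups and alternation interact in JavaScript. The patterns are built around the examples above, and the sample sentences are assumptions made for the illustration.

```javascript
const sentence = "The quick brown fox jumps over the lazy dog.";

// The parentheses limit the alternation to "fox" or "dog" and capture the choice.
const grouped = sentence.match(/the lazy (fox|dog)/);
console.log(grouped[0]); // the lazy dog
console.log(grouped[1]); // dog

// Without parentheses, | splits the whole pattern: "the lazy fox" OR "dog".
const ungrouped = sentence.match(/the lazy fox|dog/)[0];
console.log(ungrouped); // dog

// Plain alternation with the g flag lists matches in the order they appear.
const animals = "I have a dog and a cat".match(/cat|dog/g);
console.log(animals); // [ 'dog', 'cat' ]
```

In the result of match() without the g flag, index 0 holds the full match and index 1 holds the text captured by the first group.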
Lookaheads and Lookbehinds
- Lookahead: Matches based on what follows, without including it in the match.
  - Positive: (?=...)
  - Negative: (?!...)
- Lookbehind: Matches based on what precedes, without including it in the match.
  - Positive: (?<=...)
  - Negative: (?<!...)
Example
- \d(?=px) matches a digit that is followed by “px”.
- (?<=\$)\d+ matches digits preceded by “$”.
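Here is a minimal JavaScript sketch of the lookaround examples. The sample text is invented, and the last pattern is an extra illustration of a negative lookbehind that is not part of the examples above. Lookbehind needs an engine that supports ES2018 or later.

```javascript
const css = "width: 12px; price: $45"; // hypothetical sample text

// \d(?=px): a single digit that is immediately followed by "px".
const oneDigit = css.match(/\d(?=px)/g);
console.log(oneDigit); // [ '2' ]

// Adding + takes the whole number; "px" itself is never part of the match.
const beforePx = css.match(/\d+(?=px)/g);
console.log(beforePx); // [ '12' ]

// (?<=\$)\d+: digits that come right after a "$".
const afterDollar = css.match(/(?<=\$)\d+/g);
console.log(afterDollar); // [ '45' ]

// (?<!\$)\b\d+: whole numbers that do NOT come right after a "$".
const notPrice = css.match(/(?<!\$)\b\d+/g);
console.log(notPrice); // [ '12' ]
```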
Escaping Special Characters
To match special characters literally, escape them with a backslash (\).
Example
- \. matches a literal dot.
- \$ matches a literal dollar sign.
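To show the difference escaping makes, here is a short JavaScript sketch with made-up strings. The last lines also show the pattern being built from a string with the RegExp constructor, where the backslash itself has to be doubled.

```javascript
const text = "a.b axb"; // hypothetical input

// Unescaped dot: any single character fits between "a" and "b".
const anyChar = text.match(/a.b/g);
console.log(anyChar); // [ 'a.b', 'axb' ]

// Escaped dot: only a literal "." fits.
const literalDot = text.match(/a\.b/g);
console.log(literalDot); // [ 'a.b' ]

// Escaped dollar sign: matches "$" itself instead of acting as an anchor.
const price = "Total: $5".match(/\$\d/)[0];
console.log(price); // $5

// In a string passed to new RegExp(), write "\\." to get the regex \.
const fromString = text.match(new RegExp("a\\.b", "g"));
console.log(fromString); // [ 'a.b' ]
```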
Practical Example: Matching an IP Address
Regex
\d{1,3}(\.\d{1,3}){3}
Explanation
- \d{1,3}: Matches 1-3 digits.
- \.: Matches a literal dot.
- {3}: Repeats the previous group 3 times.
Combining Concepts
To match a string that is exactly an IP address and nothing else, use ^ and $ to anchor the pattern.
Example:
^\d{1,3}(\.\d{1,3}){3}$
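The effect of the anchors can be checked with a small JavaScript sketch. The inputs are made up, and isValidIPv4 is a hypothetical helper that is not part of the pattern above. It is included only because the regex checks the shape of the address, not whether each number is at most 255.

```javascript
const loose = /\d{1,3}(\.\d{1,3}){3}/;    // unanchored
const strict = /^\d{1,3}(\.\d{1,3}){3}$/; // anchored at both ends

// Unanchored, the pattern happily matches part of a malformed string.
const looseHit = "1234.1.1.1".match(loose)[0];
console.log(looseHit); // 234.1.1.1

// Anchored, the whole string has to fit the pattern.
const rejectsLong = strict.test("1234.1.1.1");
const acceptsGood = strict.test("192.168.0.1");
const rejectsShort = strict.test("192.168.0");
console.log(rejectsLong, acceptsGood, rejectsShort); // false true false

// Shape only: the regex does not know that 999 is out of range.
const shapeOnly = strict.test("999.1.1.1");
console.log(shapeOnly); // true

// Hypothetical helper (an assumption for this sketch): add a numeric range check.
function isValidIPv4(s) {
  if (!strict.test(s)) return false;
  return s.split(".").every((part) => Number(part) <= 255);
}
console.log(isValidIPv4("999.1.1.1"), isValidIPv4("192.168.0.1")); // false true
```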
Conclusion
Regex is a powerful tool for pattern matching. By combining concepts like quantifiers, groups, and anchors, you can create complex patterns to solve real-world problems.
If you found this tutorial helpful, leave a like or comment with your favorite regex use case. Stay curious and keep learning!